Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors

نویسندگان

  • Tan Lee
  • Yuanyuan Liu
  • Yu Ting Yeung
  • Thomas K. T. Law
  • Kathy Y. S. Lee
چکیده

Acoustical analysis of speech is considered a favorable and promising approach to objective assessment of voice disorders. Previous research emphasized on the extraction and classification of voice quality features from sustained vowel sounds. In this paper, an investigation on voice assessment using continuous speech utterances of Cantonese is presented. A DNN-HMM based speech recognition system is trained with speech data of unimpaired voice. The recognition accuracy for pathological utterances is found to decrease significantly with the disorder severity increasing. Average acoustic posterior probabilities are computed for individual phones from the speech recognition output lattices and the DNN soft-max layer. The phone posteriors obtained for continuous speech from the mild, moderate and severe categories are highly distinctive and thus useful to the determination of voice disorder severity. A subset of Cantonese phonemes are identified to be suitable and reliable for voice assessment with continuous speech.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Acoustic Assessment of Disordered Voice with Continuous Speech Based on Utterance-Level ASR Posterior Features

Most previous studies on acoustic assessment of disordered voice were focused on extracting perturbation features from isolated vowels produced with steady-state phonation. Natural speech, however, is considered to be more preferable in the aspects of flexibility, effectiveness and reliability for clinical practice. This paper presents an investigation on applying automatic speech recognition (...

متن کامل

Low-rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance Dnn Acoustic Modeling

We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of lowdimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstructured sparse noise over the optimal posteriors. We aim to investigate different ways to structure the DNN outputs by exploiting low-rank repres...

متن کامل

Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic Modeling

We hypothesize that optimal deep neural networks (DNN) class-conditional posterior probabilities live in a union of lowdimensional subspaces. In real test conditions, DNN posteriors encode uncertainties which can be regarded as a superposition of unstructured sparse noise over the optimal posteriors. We aim to investigate different ways to structure the DNN outputs by exploiting low-rank repres...

متن کامل

Low-rank Representation for Enhanced Deep Neural Network Acoustic Models

Automatic speech recognition (ASR) is a fascinating area of research towards realizing humanmachine interactions. After more than 30 years of exploitation of Gaussian Mixture Models (GMMs), state-of-the-art systems currently rely on Deep Neural Network (DNN) to estimate class-conditional posterior probabilities. The posterior probabilities are used for acoustic modeling in hidden Markov models ...

متن کامل

Automatic Scoring of Shadowing Speech Based on DNN Posteriors and Their DTW

Shadowing has become a well-known method to improve learners’ overall proficiency. Our previous studies realized automatic scoring of shadowing speech using HMM phoneme posteriors, called GOP (Goodness of Pronunciation) and learners’ TOEIC scores were predicted adequately. In this study, we enhance our studies from multiple angles: 1) a much larger amount of shadowing speech is collected, 2) ma...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016